Heterogeneous Data Integration with the Consensus Clustering Formalism
نویسندگان
چکیده
Meaningfully integrating massive multi-experimental genomic data sets is becoming critical for the understanding of gene function. We have recently proposed methodologies for integrating large numbers of microarray data sets based on consensus clustering. Our methods combine gene clusters into a unified representation, or a consensus, that is insensitive to mis-classifications in the individual experiments. Here we extend their utility to heterogeneous data sets and focus on their refinement and improvement. First of all we compare our best heuristic to the popular majority rule consensus clustering heuristic, and show that the former yields tighter consensuses. We propose a refinement to our consensus algorithm by clustering of the source-specific clusterings as a step before finding the consensus between them, thereby improving our original results and increasing their biological relevance. We demonstrate our methodology on three data sets of yeast with biologically interesting results. Finally, we show that our methodology can deal successfully with missing experimental values.
منابع مشابه
Entropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
متن کاملIntegrating Microarray Data by Consensus Clustering
With the exploding volume of microarray experiments comes increasing interest in mining repositories of such data. Meaningfully combining results from varied experiments on an equal basis is a challenging task. Here we propose a general method for integrating heterogeneous data sets based on the consensus clustering formalism. Our method analyzes source-specific clusterings and identifies a con...
متن کاملA new ensemble clustering method based on fuzzy cmeans clustering while maintaining diversity in ensemble
An ensemble clustering has been considered as one of the research approaches in data mining, pattern recognition, machine learning and artificial intelligence over the last decade. In clustering, the combination first produces several bases clustering, and then, for their aggregation, a function is used to create a final cluster that is as similar as possible to all the cluster bundles. The inp...
متن کاملA Robust Symmetric Nonnegative Matrix Factorization Framework for Clustering Multiple Heterogeneous Microbiome Data
Integration of multi-view datasets which are comprised of heterogeneous sources or different representations is challenging to understand the subtle and complex relationship in data. Such data integration methods attempt to combine efficiently the complementary information of multiple data types to construct a comprehensive view of underlying data. Nonnegative matrix factorization (NMF), an app...
متن کاملAdaptive Distributed Consensus Control for a Class of Heterogeneous and Uncertain Nonlinear Multi-Agent Systems
This paper has been devoted to the design of a distributed consensus control for a class of uncertain nonlinear multi-agent systems in the strict-feedback form. The communication between the agents has been described by a directed graph. Radial-basis function neural networks have been used for the approximation of the uncertain and heterogeneous dynamics of the followers as well as the effect o...
متن کامل